SYSTEM AND METHOD FOR MONITORING
LINK DELAYS AND FAULTS IN AN IP NETWORK 

TECHNICAL FIELD OF THE INVENTION 

[0001] The present invention is directed, in general, to 
computer network management and, more specifically, to a system and 
method for monitoring link delays and faults in an Internet 
Protocol (IP) network. 

BACKGROUND OF THE INVENTION 

[0002] The demand for sophisticated tools for monitoring network 
utilization and performance has been growing rapidly as Internet 
Service Providers (ISPs) offer their customers more services that 
require quality of service (QoS) guarantees and as ISP networks 
become increasingly complex. Tools for monitoring link delays and 
faults in an IP network are critical for numerous important network 
management tasks, including providing QoS guarantees to end 
applications (e.g., voice over IP), traffic engineering, ensuring 
service level agreement (SLA) compliance, fault and congestion 
detection, performance debugging, network operations and dynamic 
replica selection on the Web. Consequently, a recent flurry of 
both research and industrial activity has been focused on
developing novel tools and infrastructures for measuring network
parameters.

[0003] Existing network monitoring tools can be divided into two 
categories. The first category contains node-oriented tools for 
collecting monitoring information from network devices (routers, 
switches and hosts) using Simple Network Management Protocol/Remote
MONitoring ("SNMP/RMON") probe messages (see, Stallings, "SNMP,
SNMPv2, SNMPv3, and RMON 1 and 2," Addison-Wesley Longman Inc.,
1999, (Third Edition), incorporated herein by reference in its
entirety) or the Cisco NetFlow tool (see, "NetFlow Services and
Applications," Cisco Systems, 1999, incorporated herein by
reference in its entirety). These are useful for collecting
statistical and billing information and for measuring the
performance of individual network devices (e.g., link bandwidth
usage). However, in addition to requiring monitoring agents to be
installed at every device, these tools cannot monitor network 
parameters that involve several components, such as link or end-to- 
end path latency. 

[0004] The second category contains path-oriented tools for 
connectivity and latency measurement, such as "ping," "traceroute"
(see, e.g., Richard, "TCP/IP Illustrated," Addison-Wesley
Publishing Company, 1994, incorporated herein by reference in its
entirety) and "skitter" (see, e.g., Cooperative Association for
Internet Data Analysis (CAIDA), http://www.caida.org/), and tools
for bandwidth measurement, such as "pathchar" (see, e.g., Jacobsen,
"Pathchar - A Tool to Infer Characteristics of Internet Paths,"
April 1997, ftp://ftp.ee.lbl.gov/pathchar), "Cprobe" (see, e.g.,
Carter, et al., "Server Selection Using Dynamic Path
Characterization in Wide-Area Networks," in Proceedings of IEEE
INFOCOM '99, Kobe, Japan, April 1997), "Nettimer" (see, e.g., Lai,
et al., "Measuring Bandwidth," in Proceedings of IEEE INFOCOM '99,
New York City, New York, March 1999) and "pathrate" (see, e.g.,
Dovrolis, et al., "What Do Packet Dispersion Techniques Measure?,"
in Proceedings of IEEE INFOCOM '2001, Alaska, April 2001). As an
example, skitter
sends a sequence of probe messages to a set of destinations and 
measures the latency of a link as the difference in the round-trip 
times of the two probe messages to the endpoints of the link. A 
benefit of path-oriented tools is that they do not require special 
monitoring agents to be run at each node. However, a node with 
such a path-oriented monitoring tool, termed a monitoring station, 
is able to measure latencies and monitor faults for only a limited 
set of links in the node's routing tree (e.g., shortest path tree).
Thus, monitoring stations need to be deployed at a few strategic 
points in the ISP or enterprise IP network so as to maximize 
network coverage while minimizing hardware and software 
infrastructure costs, as well as maintenance costs for the
stations.

[0005] The need for low-overhead network monitoring has prompted
development of new monitoring platforms. The IDMaps project (see,
Francis, et al., "An Architecture for a Global Internet Host
Distance Estimation Service," in Proceedings of IEEE INFOCOM '99,
New York City, New York, March 1999, incorporated herein by
reference in its entirety) produces "latency maps" of the Internet
using special measurement servers, called "tracers," that
continually probe each other to determine their distance. These
times are subsequently used to approximate the latency of arbitrary
network paths. Different methods for distributing tracers in the
Internet are described in Jamin, et al., "On the Placement of
Internet Instrumentation," in Proceedings of IEEE INFOCOM '2000,
Tel Aviv, Israel, March 2000 (incorporated herein by reference in
its entirety), one of which is to place them such that the distance
of each network node to the closest tracer is minimized.

[0006] A drawback of the IDMaps approach is that latency
measurements may not be sufficiently accurate. Due to the small
number of paths actually monitored, it is possible for errors to be
introduced when round-trip times between tracers are used to
approximate arbitrary path latencies.

[0007] Recently, Breitbart, et al., "Efficiently Monitoring
Bandwidth and Latency in IP Networks," in Proceedings of the IEEE
INFOCOM '2000, Tel-Aviv, Israel, March 2000 (incorporated herein by
reference in its entirety), proposed a monitoring scheme where a
single network operations center (NOC) performs all the required
measurements. To monitor links not in its routing tree, the NOC
uses the IP source routing option to explicitly route probe packets
along the link. Unfortunately, due to security problems, the IP
source routing option is frequently disabled on routers.
Consequently, approaches that rely on explicitly routed probe 
messages for delay and fault monitoring are not feasible in many of 
today's ISP and enterprise environments. 

[0008] In other recent work on monitoring, Shavitt, et al.,
"Computing the Unmeasured: An Algebraic Approach to Internet
Mapping," in Proceedings of IEEE INFOCOM '2001, Alaska, April 2001,
incorporated herein by reference in its entirety, proposes to solve
a linear system of equations to compute delays for smaller path
segments from a given set of end-to-end delay measurements for
paths in the network. Similarly, Bu, et al., "Network Tomography
on General Topologies," in Proceedings of the ACM SIGMETRICS, June
2002 (incorporated herein by reference in its entirety) considers
the problem of inferring link-level loss rates and delays from end-
to-end multicast measurements for a given collection of trees.
Finally, Dilman, et al., "Efficient Reactive Monitoring," in
Proceedings of the IEEE INFOCOM '2001, Alaska, April 2001
(incorporated herein by reference in its entirety) studies ways to
minimize the monitoring communication overhead for detecting alarm
conditions due to threshold violations.

[0009] Reddy, et al., "Fault Isolation in Multicast Trees," in
Proceedings of the ACM SIGCOMM, 2000 and Adler, et al., "Tree
Layout for Internal Network Characterizations in Multicast
Networks," in Proceedings of NGC '01, London, UK, November 2001
(both incorporated herein by reference in their entirety), consider
the problem of fault isolation in the context of large multicast
distribution trees. The schemes in Reddy, et al., supra, achieve
some efficiency by having each receiver monitor only a portion of
the path (in the tree) between it and the source, but require
receivers to have some monitoring capability (e.g., the ability to
do multicast traceroute).

[0010] Adler, et al., supra, focuses on the problem of
determining the minimum cost set of multicast trees that cover
links of interest in a network. Unfortunately, Adler, et al.,
supra, does not consider network failures or issues such as
minimizing the monitoring overhead due to probe messages. Also,
Adler, et al., supra, covers only links and not the problem of
selecting the minimum number of monitoring stations whose routing 
trees cover links of interest; routing trees usually are more 
constrained (e.g., shortest path trees) than multicast trees. 

[0011] Most of the systems for monitoring IP networks described 
above suffer from three major drawbacks. First, the systems do not 
guarantee that all links of interest in the network are monitored, 
especially in the presence of network failures. Second, the
systems have limited support for accurately pinpointing the
location of a fault when a network link fails. Finally, the 
systems pay little or no attention to minimizing the overhead (due 
to additional probe messages) imposed by monitoring on the 
underlying production network. Accordingly, what is needed in the 
art is a system that fully and efficiently monitors link latencies 
and faults in an IP network using path-oriented tools.

SUMMARY OF THE INVENTION



[0012] Disclosed and claimed herein is a novel two-phased 
technique for fully and efficiently monitoring link latencies and 
faults in an ISP or Enterprise IP network, using path-oriented 
tools. Some embodiments of the technique are failure-resilient, 
and ensure complete coverage of measurements by selecting 
monitoring stations such that each network link is always in the 
routing tree of some station. The technique also reduces the 
monitoring overhead (cost), which consists of two cost components: 
(1) the infrastructure and maintenance costs associated with 
monitoring stations and (2) the additional network traffic due to 
probe messages. Minimizing the latter is especially important when 
information is collected frequently (e.g., every 15 minutes) to 
monitor continually the state and evolution of the network.

[0013] To address the above-discussed deficiencies of the prior
art, the present invention provides a system for, and method of,
monitoring link delays and faults in an IP network. In one
embodiment, the system includes: (1) a monitoring station
identifier that computes a set of monitoring stations that covers
links in at least a portion of the network and (2) a probe message
identifier, coupled to the monitoring station identifier, that
computes a set of probe messages to be transmitted by at least ones
of the set of monitoring stations such that the delays and faults
can be determined.

[0014] In the first phase of one embodiment of the technique, a 
minimal set of monitoring stations (and their locations) that 
always cover all links in the network is computed, even if some 
links were to fail. In the second, subsequent phase of the one 
embodiment, the minimal set of probe messages transmitted by each 
station is computed such that the latency of every network link can 
be measured and every link failure can be detected. The following
topics related to the novel technique will be addressed herein.

[0015] (1) Novel algorithms for station selection. The problem
of computing the minimal set of stations whose routing trees cover
all network links is NP-hard. The station selection problem maps
to the known set cover problem (see, Chavatal, "A Greedy Algorithm
for the Set-Covering Problem," Math. of Operation Research, Vol. 4,
No. 3, pp 233-235, 1979, incorporated herein by reference). Thus,
a polynomial-time greedy algorithm yields a solution that is within
a logarithmic factor of the optimal. Further, the logarithmic
factor is a lower bound on the degree of approximation achievable
by any algorithm.

[0016] (2) Novel algorithms for probe message assignment. The
problem of computing the optimal set of probe messages for
measuring the latency of network links is also NP-hard. A
polynomial-time greedy algorithm will be disclosed that computes a
set of probe messages whose cost is within a factor of two of the
optimal solution. Again, this approximation factor is quite close
to the best possible approximation result.

[0017] The foregoing has outlined, rather broadly, preferred and 
alternative features of the present invention so that those skilled 
in the art may better understand the detailed description of the 
invention that follows. Additional features of the invention will 
be described hereinafter that form the subject of the claims of the 
invention. Those skilled in the art should appreciate that they 
can readily use the disclosed conception and specific embodiment as 
a basis for designing or modifying other structures for carrying 
out the same purposes of the present invention. Those skilled in 
the art should also realize that such equivalent constructions do 
not depart from the spirit and scope of the invention in its 
broadest form.

BRIEF DESCRIPTION OF THE DRAWINGS



[0018] For a more complete understanding of the present 
invention, reference is now made to the following descriptions 
taken in conjunction with the accompanying drawings, in which: 
[0019] FIGURE 1 illustrates a system for monitoring link delays 
and faults in an IP network constructed according to the principles 
of the present invention; 

[0020] FIGURE 2 illustrates a method of monitoring link delays 
and faults in an IP network carried out according to the principles 
of the present invention; 

[0021] FIGURE 3 illustrates a pseudocode listing of a greedy SC 
algorithm constructed according to the principles of the present 
invention; 

[0022] FIGURES 4A-4D together illustrate the effect of a link
failure on a monitoring system constructed according to the 
principles of the present invention; and 

[0023] FIGURE 5 illustrates a pseudocode listing of a greedy 
PMSC algorithm constructed according to the principles of the 
present invention.

DETAILED DESCRIPTION



[0024] Network Model 

[0025] An undirected graph G(V,E) is used to model a service 
provider or enterprise IP network, where the graph nodes, V, denote 
the network routers and the graph edges, E, represent the 
communication links connecting them. 

[0026] |V| and |E| denote the number of nodes and edges,
respectively, in the undirected graph G(V,E). Further, P_{s,t} denotes
the path traversed by an IP packet from a source node s to a
destination node t. For purposes of the model, it is assumed that
packets are forwarded through the network using standard IP
forwarding techniques; that is, each node relies exclusively on the
destination address in the packet to determine the next hop. The
result of this assumption is that, for every node x ∈ P_{s,t}, P_{x,t} is
included in P_{s,t}. In addition, P_{s,t} is also assumed to be the
routing path in the opposite direction from node t to node s. This
in turn implies that, for every node x ∈ P_{s,t}, P_{s,x} is a prefix of
P_{s,t}.

[0027] As a consequence, it follows that for every node s ∈ V,
the subgraph obtained by merging all the paths P_{s,t}, for every t ∈
V, must have a tree topology. This tree for node s is referred to
as the "routing tree" (RT) for s, denoted by T_s. Note that tree T_s
defines the routing paths from node s to all the other nodes in V,
and vice versa.

[0028] For a service provider network consisting of a single
OSPF area, the RT T_s of node s is its shortest path tree. However,
for networks consisting of multiple OSPF areas or autonomous 
systems (that exchange routing information using the Border Gateway 
Protocol, or BGP) , packets between nodes may not necessarily follow 
shortest paths. In practice, the topology of RTs can be calculated 
by querying the routing tables of nodes. 
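
By way of non-limiting illustration, the following Python sketch computes the RT of a node for the single-OSPF-area case, in which the RT is the shortest path tree. The adjacency map `adj`, the function name `shortest_path_tree` and the example link weights are assumptions made only for this illustration; in practice the RTs may instead be obtained by querying routing tables, as noted above.

```python
import heapq

def shortest_path_tree(adj, s):
    """Return the routing tree T_s as a set of undirected links.

    adj maps each node to {neighbor: positive link weight} (hypothetical input).
    Each link is a frozenset so that (u, v) and (v, u) compare equal.
    """
    dist = {s: 0}
    parent = {}
    heap = [(0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in adj[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return {frozenset((v, p)) for v, p in parent.items()}

# Hypothetical four-node network.
adj = {
    "s": {"a": 1, "b": 4},
    "a": {"s": 1, "b": 1, "c": 5},
    "b": {"s": 4, "a": 1, "c": 1},
    "c": {"a": 5, "b": 1},
}
tree = shortest_path_tree(adj, "s")
```

In this example the resulting tree contains |V|-1 links, consistent with the tree topology described above.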

[0029] In case a link f in the network fails, the IP routing
protocols define a new delivery tree, T_{s,f}, for every node s ∈ V.
(This typically takes a few seconds to a few tens of seconds
depending on the IP routing protocol parameter settings.) The new
tree T_{s,f} has the property that every path P_{s,t} in T_s, t ∈ V, that
does not contain link f is also included in the tree T_{s,f}. The
reason for this is that the failure of link f only affects those
routing paths in T_s that contain f. Thus, it may be possible to
infer the topology of a significant portion of T_{s,f} directly from T_s
without any knowledge of the route computation algorithms followed
by the routers. Note that if f ∉ T_s, then T_{s,f} = T_s.
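
The property stated above can be sketched in Python as follows. The representation of T_s as a `parent` map (each node pointing to its next hop toward s) and the function names are assumptions made for illustration only. The sketch returns the links of T_s that are guaranteed to remain in T_{s,f}, namely those lying on some path P_{s,t} that does not contain the failed link f.

```python
def path_edges(parent, s, t):
    """Links on the tree path from t back to the root s."""
    edges = []
    while t != s:
        edges.append(frozenset((t, parent[t])))
        t = parent[t]
    return edges

def unaffected_edges(parent, s, f):
    """Links of T_s that must also appear in T_{s,f} after link f fails."""
    f = frozenset(f)
    kept = set()
    for t in parent:
        edges = path_edges(parent, s, t)
        if f not in edges:  # path P_{s,t} does not use the failed link
            kept.update(edges)
    return kept

# Hypothetical routing tree rooted at s: s-a-b-c and s-d.
parent = {"a": "s", "b": "a", "c": "b", "d": "s"}
kept = unaffected_edges(parent, "s", ("a", "b"))
```

When f is not a link of T_s, every path is unaffected and the function returns all of T_s, matching the note that T_{s,f} = T_s in that case.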
[0030] For purposes of the model, a positive cost c_{s,t} is
associated with sending a message along the path P_{s,t} between any
pair of nodes s,t ∈ V. For every intermediate node x ∈ P_{s,t}, both
c_{s,x} and c_{x,t} are at most c_{s,t}, and c_{s,x} + c_{x,t} ≥ c_{s,t}. Typical
examples of this cost model are the fixed cost model, where all
messages have the same cost, and the hop count model, where the
message cost is the number of hops in its route. For this reason,
h_{s,t} denotes the number of hops in path P_{s,t}.
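
As a minimal sketch, the two cost models named above can be written as Python functions over a path given as a list of nodes, together with a check of the two conditions on intermediate nodes. The function names and the example path are hypothetical and serve only to illustrate the model.

```python
def fixed_cost(path):
    """Fixed cost model: every message costs the same (here, 1)."""
    return 1

def hop_cost(path):
    """Hop count model: cost equals the number of links on the path."""
    return len(path) - 1

def satisfies_cost_assumptions(cost, path):
    """Check c_{s,x} <= c_{s,t}, c_{x,t} <= c_{s,t} and c_{s,x} + c_{x,t} >= c_{s,t}
    for every intermediate node x of the path from s to t."""
    total = cost(path)
    for i in range(1, len(path) - 1):
        c_sx = cost(path[: i + 1])
        c_xt = cost(path[i:])
        if c_sx > total or c_xt > total or c_sx + c_xt < total:
            return False
    return True

path = ["s", "x", "y", "t"]  # hypothetical path P_{s,t}
```

Both models satisfy the stated conditions on this path: the hop count model with equality in the sum, the fixed cost model with strict inequality.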
[0031] Delay Monitoring Framework 

[0032] The methodology for complete measurement of round-trip 
latency of network links in an IP network will now be described. 
The methodology can also be extended to detect link failures as 
well as be resilient to them; these extensions will be described in 
more detail below. 

[0033] For monitoring the round-trip delay of a link e ∈ E, a
node s ∈ V, such that e belongs to s's RT (i.e., e ∈ T_s), must be
selected as a monitoring station. Node s sends two probe messages 
(which may be implemented by using "ICMP ECHO REQUEST/REPLY" 
messages similar to ping) to the end-points of e, which travel 
almost identical routes, except for the link e. Upon receiving a 
probe message, the receiver replies immediately by sending a probe 
reply message to the monitoring station. Thus, the monitoring 
station s can calculate the round-trip delay of the link by 
measuring the difference in the round-trip times of the two probe 
messages (see also, skitter, described in CAIDA, supra).
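
The calculation described above reduces to a subtraction, sketched here in Python. The function name and the millisecond values are hypothetical; obtaining the two round-trip times (for example, from ICMP echo replies) is outside the scope of the sketch.

```python
def link_round_trip_delay(rtt_to_u, rtt_to_v):
    """Round-trip delay of link (u, v) as seen from a station whose probes
    to u and v follow the same route except for the link itself."""
    return abs(rtt_to_v - rtt_to_u)

# Hypothetical measurements in milliseconds.
delay_ms = link_round_trip_delay(rtt_to_u=12.5, rtt_to_v=14.0)
```

The absolute value is an illustrative design choice so that the caller need not know which endpoint is farther from the station.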
[0034] From the above description, it follows that a monitoring 
station can only measure the delays of links in its RT. 
Consequently, a monitoring system designated for measuring the
delays of all network links has to find a set of "monitoring
stations" S ⊆ V and a "probe message assignment" A ⊆ {m(s,u) | s ∈ S,
u ∈ V}, where each probe message m(s,u) represents a probe message
that is sent from the monitoring station s to node u. The set S
and the probe message assignment A are required to satisfy two
constraints:

(1) a "covering set" constraint that guarantees that all
links are covered by the RTs of the nodes in S, i.e.,
∪_{s ∈ S} T_s = E; and

(2) a "covering assignment" constraint that ensures that for
every edge e = (u,v) ∈ E, a node s ∈ S exists such that
e ∈ T_s and A contains the messages m(s,u) and m(s,v).
(If one of the endpoints of e = (u,v) is in S, say u ∈ S,
then A is only required to contain the message m(u,v).)

[0035] The covering assignment constraint essentially ensures
that every link is monitored by some station. A pair (S,A) that
satisfies the above constraints is referred to as a "feasible
solution." Note that although only the problem of monitoring all 
network links is described herein, the results also apply to the 
problem of monitoring only a subset of links of interest. 
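
The two constraints that define a feasible solution can be checked mechanically. The following Python sketch is illustrative only; the data layout (links as frozensets, `trees` mapping each station to the links of its RT, probe messages as `(station, target)` pairs) and all names are assumptions not taken from the description.

```python
def link(u, v):
    """Undirected link as an order-independent key."""
    return frozenset((u, v))

def satisfies_covering_set(edges, trees, S):
    """Constraint (1): the union of the RTs of the stations in S equals E."""
    covered = set()
    for s in S:
        covered |= trees[s]
    return {link(u, v) for u, v in edges} <= covered

def satisfies_covering_assignment(edges, trees, S, A):
    """Constraint (2): each link has a station whose RT contains it and
    whose probes to both endpoints are in A (no probe to the station itself)."""
    for u, v in edges:
        ok = False
        for s in S:
            if link(u, v) not in trees[s]:
                continue
            needed = {(s, x) for x in (u, v) if x != s}
            if needed <= A:
                ok = True
                break
        if not ok:
            return False
    return True

# Hypothetical chain s - a - b monitored by a single station s.
edges = [("s", "a"), ("a", "b")]
trees = {"s": {link("s", "a"), link("a", "b")}}
S = {"s"}
A = {("s", "a"), ("s", "b")}
feasible = (satisfies_covering_set(edges, trees, S)
            and satisfies_covering_assignment(edges, trees, S, A))
```

Removing the probe `("s", "b")` from `A` in this example leaves link (a,b) unmonitored, so the pair is no longer feasible.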
[0036] As stated above, the overhead (cost) of a monitoring 
system consists of two cost components, the overhead of installing 
and maintaining the monitoring stations, and the communication cost
of sending probe messages. In practice, it is preferable to have
as few stations as possible since this reduces operational costs.
Therefore, a two-phased approach is adopted for optimizing
monitoring overheads. In the first phase, an optimal set of
monitoring stations is selected, while in the second, the optimal
probe messages are computed for the selected stations. An "optimal
station selection" S is one that satisfies the covering set
requirement while simultaneously minimizing the number of stations.
After selecting the monitoring stations S, an "optimal probe
message assignment" A is one that satisfies the covering assignment
constraint and minimizes the sum Σ_{m(s,v) ∈ A} c_{s,v}. Note that choosing
c_{s,v} = 1 essentially results in an assignment A with the minimum
number of probe messages, while choosing c_{s,v} to be the hop distance,
h_{s,v}, yields a set of probe messages that traverses the fewest
possible network links.
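
The objective can be evaluated as in the following Python sketch, which is illustrative only. The function name, the representation of probe messages as `(station, target)` pairs and the hop counts are hypothetical.

```python
def assignment_cost(A, cost=None):
    """Sum of c_{s,v} over all probe messages (s, v) in A.

    cost=None selects the fixed cost model c_{s,v} = 1; otherwise cost is a
    dict of per-message costs such as hop distances h_{s,v}.
    """
    if cost is None:
        return len(A)
    return sum(cost[probe] for probe in A)

# Hypothetical assignment for one station s probing nodes a, b and c.
A = {("s", "a"), ("s", "b"), ("s", "c")}
hops = {("s", "a"): 1, ("s", "b"): 2, ("s", "c"): 3}
num_messages = assignment_cost(A)          # fixed cost model
links_traversed = assignment_cost(A, hops)  # hop count model
```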

[0037] A final component of the monitoring infrastructure 
described herein is the network operations center (NOC) , which is 
responsible for coordinating the actions of the set of monitoring 
stations S. The NOC queries network nodes to determine their RTs, 
and subsequently uses these to compute a near-optimal set of 
stations and a probe message assignment for them, as described 
below.

[0038] Referring initially to FIGURE 1, illustrated is a system,
generally designated 110, for monitoring link delays and faults in 
an IP network 100. The system 110 is associated with a NOC (not 
referenced) of the network 100 and includes a monitoring station 
identifier 120. The monitoring station identifier 120 computes a 
set of monitoring stations in the network 100 that covers links in 
at least a portion of the network 100. In the embodiment 
illustrated in FIGURE 1, the monitoring station identifier 120 
computes a minimal set of monitoring stations that covers an 
entirety of the links in the network 100. Because computing the 
minimal set of monitoring stations is an NP-hard problem, the 
monitoring station identifier 120 preferably employs
polynomial-time approximation algorithms (to be described below) to compute
the set of monitoring stations. 

[0039] The system 110 further includes a probe message 
identifier 130. The probe message identifier 130 is associated 
with the monitoring station identifier 120. The probe message 
identifier 130 computes a set of probe messages to be transmitted
by at least ones of the set of monitoring stations such that the
delays and faults can be determined. In the embodiment illustrated
in FIGURE 1, the probe message identifier 130 computes a minimal
set of probe messages.

[0040] As with computing the minimal set of monitoring stations,
computing the minimal set of probe messages is an NP-hard problem.
Therefore, the probe message identifier 130 preferably employs
polynomial-time approximation algorithms (also to be described
below) to compute the set of probe messages.

[0041] The "cost" of the minimal set of probe messages is 
determined by the cost of sending the probe messages through the 
network. The probe messages can be assumed to have identical 
message costs, message costs that are based on a number of hops to 
be made by the probe messages or any other basis as a particular 
application may find advantageous. 

[0042] FIGURE 2 illustrates a method, generally designated 200, 
of monitoring link delays and faults in an IP network carried out 
according to the principles of the present invention. The method 
200 begins in a start step 210, wherein it is desired to monitor 
the IP network in an efficient and thorough manner. 

[0043] The method proceeds to a step 220, in which a set of 
monitoring stations that covers links in at least a portion of the 
network is computed. As stated above, the set of monitoring 
stations is preferably a minimal set of stations having associated 
routing trees that cover all considered sets of monitored links. 

[0044] The method 200 then proceeds to a step 230, in which a
set of probe messages to be transmitted by at least ones of the set 
of monitoring stations is computed. Again, the set of probe 
messages is preferably a minimal cost set, and since the problem to
be solved is NP-hard, an approximation algorithm that provides a
near-optimal solution is presented herein.

[0045] Next, the various monitoring stations in the computed set 
of monitoring stations are configured for operation in a step 240. 
The monitoring stations are assigned probe messages to transmit so 
they can monitor their various assigned links such that delays and 
faults can be determined. The method ends in an end step 250. 
[0046] Delay Monitoring Algorithms 

[0047] Polynomial-time approximation algorithms will now be
presented for solving the station selection and probe message
assignment problems in the context of a scenario that does not
consider network link failures. A ln(|V|)-approximation algorithm
for station selection and a 2-approximation algorithm for probe
message assignment will be set forth.

[0048] Of great interest is solving the problem of covering all 
graph edges with a small number of RTs. The "Link Monitoring
Problem," or LM, is therefore defined as follows. Given a graph
G(V,E) and an RT, T_v, for every node v ∈ V, find the smallest
subset S ⊆ V such that ∪_{v ∈ S} T_v = E.

[0049] For the clarity of presentation, only the unweighted 
version of the LM problem is considered. However, the results can 
easily be extended to the weighted version of the problem, where 
each node has an associated cost, and a set S ⊆ V that minimizes
the total cost of the monitoring stations in S is sought. The
latter can be used, for instance, to find a station selection when
monitoring stations can be installed only at a restricted set of
nodes. For restricting the station selection, nodes that cannot
support monitoring stations are assigned infinite cost.

[0050] The LM problem is similar to the set cover (SC) problem,
which is a well-known NP-hard problem. In an instance I(Z,Q) of
the SC problem, Z = {z_1, z_2, ..., z_m} is a universe of m elements
and Q = {Q_1, Q_2, ..., Q_n} is a collection of n subsets of Z (assume
that ∪_{Q_i ∈ Q} Q_i = Z). The SC problem seeks to find the smallest
collection of subsets S ⊆ Q, such that their union contains all the
elements in Z, i.e., ∪_{Q_i ∈ S} Q_i = Z.

[0051] Without setting forth the proof herein, the LM problem is 
NP-hard, even when the RT of each node is restricted to be its 
shortest path tree. Further, the lower bound of any approximation
algorithm for the LM problem is (1/2)·ln(|V|).

[0052] An efficient algorithm for solving the LM problem will 
now be described. The algorithm maps the given instance of the LM 
problem, involving graph G(V,E) , to an instance of the SC problem, 
and then uses a greedy algorithm for solving the SC instance. In 
the mapping, the set of edges E defines the universe of elements Z,
and the collection of sets Q includes the subsets Q_v = {e | e ∈ T_v}
for every node v ∈ V. The greedy SC algorithm, for which pseudocode
is depicted in FIGURE 3, is an iterative algorithm that selects, in
each iteration, the set in Q with the maximum number of uncovered
elements.
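
A minimal Python sketch of the greedy selection just described follows. It is not a transcription of FIGURE 3; the function name, the data layout (`trees` mapping each candidate node v to the set of links in T_v) and the rule of breaking ties by smallest node identifier are assumptions made for illustration.

```python
def link(u, v):
    """Undirected link as an order-independent key."""
    return frozenset((u, v))

def greedy_station_selection(trees, edges):
    """Repeatedly pick the node whose RT covers the most uncovered links."""
    uncovered = set(edges)
    stations = []
    while uncovered:
        # max() returns the first maximal item, so sorting breaks ties
        # in favor of the smallest node identifier (an assumption).
        best = max(sorted(trees), key=lambda v: len(trees[v] & uncovered))
        gained = trees[best] & uncovered
        if not gained:
            raise ValueError("some links are not in any routing tree")
        stations.append(best)
        uncovered -= gained
    return stations

# Hypothetical four-node ring a-b-c-d-a with one RT per node.
edges = {link("a", "b"), link("b", "c"), link("c", "d"), link("d", "a")}
trees = {
    "a": {link("a", "b"), link("a", "d"), link("b", "c")},
    "b": {link("a", "b"), link("b", "c"), link("c", "d")},
    "c": {link("b", "c"), link("c", "d"), link("a", "b")},
    "d": {link("a", "d"), link("c", "d"), link("b", "c")},
}
stations = greedy_station_selection(trees, edges)
```

In this example no single RT covers all four links, so two stations are selected.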

[0053] According to Chavatal, supra, the greedy algorithm is a
(ln(Δ)+1)-approximation algorithm for the SC problem, where Δ is
the size of the largest subset. Thus, since for the LM problem,
every subset includes all the edges of the corresponding RT and its
size is exactly |V|-1, the following result is obtained.

[0054] It will be stated without proof that the greedy algorithm
computes a (ln(Δ)+1)-approximation for the LM problem, where
Δ = |V|-1. Note that the worst-case time complexity of the greedy
algorithm can be shown to be O(|V|^3).

[0055] Once a set S of monitoring stations has been selected, a
probe message assignment A for measuring the latency of network
links should be computed. As stated above, a "feasible probe
message assignment" is a set of probe messages {m(s,u) | s ∈ S, u ∈
V}, where each m(s,u) represents a probe message that is sent from
station s to node u. Further, for every edge e = (u,v) ∈ E, a
station s ∈ S exists such that e ∈ T_s and A contains the probe
messages m(s,u) and m(s,v). (If s is one of the edge endpoints,
say node v, then the probe message m(s,v) is omitted from A.) The
cost of a probe message assignment A is COST_A = Σ_{m(s,u) ∈ A} c_{s,u}, and
the "optimal probe message assignment" is the one with the minimum
cost.

[0056] As with the LM problem, computing the optimal probe 
message assignment is NP-hard. 

[0057] A simple probe message assignment algorithm that computes 
an assignment whose cost is within a factor of two of the optimal 
solution will now be described. Consider a set of monitoring 
stations S and for every edge e ∈ E, let S_e = {s | s ∈ S ∧ e ∈ T_s} be
the set of stations that can monitor e. For each edge e = (u,v) ∈
E, the algorithm selects as the monitoring station of e the node s_e
∈ S_e for which the cost c_{s_e,u} + c_{s_e,v} is minimum. In case of ties
(occurring when multiple stations have the same cost), the
algorithm selects s_e to be the station with the minimum identifier
among the tied stations. Then, it adds the probe messages m(s_e,u)
and m(s_e,v) to A. It will be stated without proof that the
approximation ratio of the simple probe message assignment 
algorithm is two. 
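
The simple probe message assignment algorithm can be sketched in Python as follows. The sketch is illustrative only: the function name, the `trees` and `cost` dictionaries and the example hop counts are hypothetical, and a probe from a station to itself is treated as unnecessary and costless, consistent with the omission rule stated earlier.

```python
def link(u, v):
    """Undirected link as an order-independent key."""
    return frozenset((u, v))

def assign_probes(edges, trees, cost):
    """For each link, choose the cheapest station able to monitor it and
    add that station's probes to the link's endpoints.

    trees maps each selected station s to the links of T_s;
    cost maps (s, node) to the message cost c_{s,node}.
    """
    A = set()
    for u, v in edges:
        candidates = sorted(s for s in trees if link(u, v) in trees[s])
        if not candidates:
            raise ValueError(f"link ({u}, {v}) is in no station's RT")

        def pair_cost(s):
            return sum(cost[(s, x)] for x in (u, v) if x != s)

        # min() keeps the first minimum, so ties go to the smallest identifier.
        s_e = min(candidates, key=pair_cost)
        A.update((s_e, x) for x in (u, v) if x != s_e)
    return A

# Hypothetical ring a-b-c-d-a with stations a and b and hop-count costs.
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("c", "d")]
trees = {
    "a": {link("a", "b"), link("a", "d"), link("b", "c")},
    "b": {link("a", "b"), link("b", "c"), link("c", "d")},
}
cost = {("a", "b"): 1, ("a", "d"): 1, ("a", "c"): 2,
        ("b", "a"): 1, ("b", "c"): 1, ("b", "d"): 2}
A = assign_probes(edges, trees, cost)
```

Here link (b,c) lies in both RTs, and station b is chosen because its single probe to c costs less than the two probes station a would need.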

[0058] Delay Monitoring with Fault Detection 

[0059] The probe message-based delay monitoring system described 
in the previous section can be extended to also detect network link 
failures. Using probe messages to pinpoint network faults has 
several advantages over monitoring routing protocol messages (e.g., 
OSPF LSAs) or using SNMP traps to identify failed links. First, 
probe message-based techniques are routing protocol agnostic; as a
result, they can be used with a range of protocols, such as Open
Shortest Path First (OSPF), Intermediate System to Intermediate
System (IS-IS) or Routing Information Protocol (RIP). Second, SNMP
trap messages may be somewhat unreliable, since they are
transported using User Datagram Protocol (UDP) datagrams. The
probe message-based fault detection algorithms described herein can
be used either stand-alone, or in conjunction with SNMP traps, to
build a robust infrastructure for accurately pinpointing network
faults.

[0060] The probe message-based methodology set forth above,
while suitable for estimating link delays, can, with modification,
also identify failed network links.

[0061] Example 1: Consider the graph G(V,E) depicted in FIGURE 
4A, where each link is labeled with its weight. 

[0062] Let S = {s_1, s_2} be the set of selected monitoring
stations. The RTs T_{s_1} and T_{s_2} are the shortest path trees rooted at
nodes s_1 and s_2 as illustrated in FIGURES 4B and 4C, respectively.
The simple probe message assignment algorithm assigns all graph
links to be monitored by s_1 except the links (s_2,a) and (c,d), which
are monitored by s_2. Note that s_2 transmits two probe messages,
m(s_2,c) and m(s_2,d), that traverse nodes a, b and c to measure the
latency of link (c,d). Now, consider the failure of link (a,b)
that causes the RTs of s_1 and s_2 to be modified as shown in FIGURE
4D. (RTs typically adapt to failures in a few seconds or a few
tens of seconds, depending on IP routing protocol parameter
settings.) Specifically, the new RT for s_1 contains the link (s_2,a)
instead of (a,b), while the tree for s_2 contains the links (s_1,b)
and (y,d) instead of (a,b) and (c,d). Clearly, neither s_1 nor s_2
detects the failure of link (a,b), and further, since the probe
messages m(s_2,c) and m(s_2,d) traverse a diverse set of nodes ({s_1,b}
and {s_1,x,y}, respectively), they no longer measure the latency
of link (c,d).

[0063] The delay monitoring framework will now be extended also
to detect link failures. The fault monitoring infrastructure
described herein uses the same set of stations S used for delay
monitoring; thus, the stations in S cover all links in the
network. However, as shown above in Example 1, a new set of probe
messages is required for identifying failed links (in addition to
measuring link delays).

[0064] To develop the new set of probe messages, an algorithm 
for computing a near-optimal set of probe messages for detecting 
the failure of a network link will first be described. Then, an 
algorithm for accurately pinpointing the faulty link based on probe 
message results from the various stations will be described. The 
algorithms can easily be extended to handle node failures in 
addition. 

[0065] Detecting a Network Link Failure

[0066] In the fault monitoring methodology described herein, probe
messages use the time-to-live (TTL) field of the IP header. The
TTL field provides an efficient way to bound the number of hops
that an IP packet traverses in its path (see, Richard, supra).
Essentially, the TTL field of an IP packet is initialized by the
source node to the maximum number of hops that the packet is
allowed to traverse. Each router along the path followed by the
packet then decrements the TTL value by one until it reaches zero.
When a router decrements the TTL field to zero, it discards the
packet and sends an ICMP "time exceeded" reply to the packet's
source node. (This method is used by the traceroute application
for discovering routers in the path between a given pair of nodes.)

[0067] Now suppose that link e = (u,v) ∈ T_s should be monitored
by station s ∈ S, and let node v be the node that is further from
s. The appropriate strategy is to set the TTL values in the two
probe messages m(s,u) and m(s,v) that measure link e's delay such
that the probe messages can also detect changes in T_s due to e's
failure. One option is to set the TTL field of probe message
m(s,v) to h_{s,v}, the hop distance in T_s between nodes s and v.
This guarantees that the probe message does not traverse more than
h_{s,v} hops. Thus, a reply message from a node other than v
indicates a link failure along the path P_{s,v} in T_s. While this
observation enables some failures to be detected, others may be
missed. Example 2 illustrates.



[0068] Example 2: Consider the graph in Example 1, and assume a
failure of the link (a,b) monitored by s_1. The hop distances from
s_1 to nodes b and a before the failure are h_{s_1,b} = 1 and
h_{s_1,a} = 2, and they remain the same after (a,b) fails. Thus,
s_1 cannot detect the failure of link (a,b).

[0069] The above problem can be fixed by associating the same
destination address with the two probe messages for link e = (u,v).
Let m(s,t,h) denote the probe message sent by source s to a
destination node t with TTL value set to h. Further, let
R_{m(s,t,h)} be the node that replies to the probe message.
Assuming that node u is closer to station s than node v, i.e.,
h_{s,u} < h_{s,v}, station s can monitor both the delay as well as
failure of link e by sending the following two probe messages:
m(s,v,h_{s,u}) and m(s,v,h_{s,v}). These messages have the same
destination, v, but they are sent by s with different TTL values.

[0070] Clearly, in the absence of failures, the reply for the
first message is sent by node u while the reply for the second
message is sent by node v, i.e., R_{m(s,v,h_{s,u})} = u and
R_{m(s,v,h_{s,v})} = v. Thus, the difference in the round-trip
times of the two probe messages gives link e's delay. Further, if
link e fails, and since u and v are no longer adjacent in the new
RT T_{s,e} for s, at least one of these replies will be originated
by a different node. This means that either R_{m(s,v,h_{s,u})} ≠ u
or R_{m(s,v,h_{s,v})} ≠ v, or both.

[0071] Thus, the probe message assignment A essentially
contains, for each edge e = (u,v), the probe messages
m(s_e,v,h_{s_e,u}) and m(s_e,v,h_{s_e,v}), where s_e ∈ S_e is the
station for which c_{s_e,u} + c_{s_e,v} is minimum (ties are broken
in favor of stations with smaller identifiers). Further, if for
e = (u,v), R_{m(s_e,v,h_{s_e,u})} ≠ u or R_{m(s_e,v,h_{s_e,v})} ≠ v,
then station s_e informs the NOC that a network link has failed.

[0072] Revisiting Example 2, s_1 sends probe messages m(s_1,a,1)
and m(s_1,a,2) to monitor link (a,b). In case link (a,b) fails,
then with the new RT for s_1, R_{m(s_1,a,1)} would be s_2 instead
of b. Therefore, s_1 would detect the failure of link (a,b). The
probe message assignment A described above thus monitors all
network links for both delay as well as faults. Further, it will be
stated without proof that the cost of probe message assignment A is
within a factor of two of the optimal probe message assignment.
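
The two-probe test of paragraphs [0069] through [0072] can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `replier` and `link_failed` are hypothetical, a route is modeled as an explicit list of nodes from the station to the destination, and the two routes from s_1 to node a are the ones described in Examples 1 and 2 (FIGURE 4 is not reproduced here).

```python
def replier(path, ttl):
    """Return the node that answers a probe sent along `path` with this TTL.

    `path` lists the nodes from the station (index 0) to the destination.
    The node `ttl` hops away sends the reply; if the destination is closer
    than `ttl` hops, the destination itself answers.
    """
    if ttl < 1:
        raise ValueError("TTL must be at least 1")
    return path[min(ttl, len(path) - 1)]


def link_failed(current_path, u, v, h_su, h_sv):
    """Apply the two-probe test for link (u, v), with u closer to the station.

    Both probes are addressed to v; only the TTL differs. A failure is
    reported when either reply comes from an unexpected node.
    """
    return (replier(current_path, h_su) != u
            or replier(current_path, h_sv) != v)


# Station s1 monitors link (a, b): u = b (1 hop away), v = a (2 hops away).
path_before = ["s1", "b", "a"]   # route to a in the original RT of s1
path_after = ["s1", "s2", "a"]   # route to a after link (a, b) fails

print(link_failed(path_before, "b", "a", 1, 2))  # False
print(link_failed(path_after, "b", "a", 1, 2))   # True
```

Note that `replier(path_after, 2)` is still `a`: a single probe with TTL h_{s,v} sees nothing wrong, which is the situation of Example 2. Only the probe with TTL h_{s,u} exposes the change.
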
[0073] Let P_{s,e} denote the path between station s and link e
(and including e) in T_s. While probe message assignment A ensures
that any failure of a link e is detected by the station s that
monitors e, s may not always be able to infer (by itself) whether
the faulty link is e or some other link in P_{s,e}. The reason for
this is that replies for the probe messages for link e may be
affected by the failure of a link f in P_{s,e}, and f may not be
monitored by s. Thus, s may be unable to conclude whether the
erroneous replies to e's probe messages were caused by the failure
of e or f.

[0074] Example 3: Consider the graph in Example 1 where station
s_2 monitors links (s_2,a) and (c,d), and s_1 monitors the remaining
links. Suppose that for probe message m(s_2,d,3) that monitors link
(c,d), R_{m(s_2,d,3)} is y instead of c. This could be the result
of the failure of either link (c,d) or (a,b). In both cases, the new
routing path from s_2 to d traverses nodes s_1, x and y. Since s_2
does not monitor link (a,b), it cannot conclude, by itself, which
of links (a,b) or (c,d) has failed.

[0075] An algorithm for accurately pinpointing the faulty link, 
in which each monitoring station sends its failure information to 
the NOC, will now be presented. 

[0076] As shown in Example 3, with probe message assignment A,
when a link fails, the station monitoring the link detects the
failure, but may not always be able to accurately identify the
location of the failed link. However, station s can narrow down
the possible candidates to links in the path connecting it to the
failed link.

[0077] It will be stated without proof that, if
R_{m(s,v,h_{s,u})} ≠ u or R_{m(s,v,h_{s,v})} ≠ v for link (u,v)
monitored by station s, a failure has occurred along the path
P_{s,v} in T_s. It will now be shown how this fact can be used to
identify the faulty link after a monitoring station s detects a
failure. A single link failure is assumed, since the likelihood of
multiple concurrent link failures within a short time interval is
very small.

[0078] Let E_s ⊆ E be the set of links monitored by a station
s ∈ S as a result of probe message assignment A. Since a link
failure may affect the paths of multiple probe messages sent by s,
F_s ⊆ E_s denotes the set of links (u,v) ∈ E_s for which
R_{m(s,v,h_{s,u})} ≠ u or R_{m(s,v,h_{s,v})} ≠ v. When a station s
detects a failure, the station s computes the set F_s, and then
finds the link f_s ∈ F_s closest to s. Thus, no other link
e ∈ F_s is contained in the path P_{s,f_s}. As concluded
previously, the faulty link must be included in the path P_{s,f_s}.

[0079] Clearly, if all the links in path P_{s,f_s} are in E_s,
then f_s must be the failed link, since none of the other links in
P_{s,f_s} are in F_s. However, it is possible that some links in
P_{s,f_s} are not in E_s (in Example 3, link (a,b) in
P_{s_2,(c,d)} is not in E_{s_2}). Thus, s cannot conclude that f_s
is the failed link, since a link in P_{s,f_s} − E_s may have
failed. One option is for s to send additional probe messages,
m(s,v,h_{s,u}) and m(s,v,h_{s,v}), for monitoring every link
(u,v) ∈ P_{s,f_s} − E_s. Station s can then declare the faulty
link to be the link (u',v') closest to it in P_{s,f_s} − E_s for
which R_{m(s,v',h_{s,u'})} ≠ u' or R_{m(s,v',h_{s,v'})} ≠ v'.
With this technique, s sends O(|V|) additional messages in the
worst case. Further, since the faulty link may be detected by
multiple monitoring stations, the total number of extra messages is
O(|S|·|V|).



[0080] A centralized approach will now be presented for
identifying the faulty link at the system NOC, without sending
additional probe messages. In the NOC-based approach, each station
s that detects a failure transmits to the NOC a "FAULT DETECTED"
message containing the identity of link f_s. When the NOC receives
the message, the NOC calculates the set C_s of the potentially
failed links detected by s as:

    C_s = (P_{s,f_s} − E_s) ∪ {f_s}

Note that the NOC may receive a FAULT DETECTED message from more
than one station, and for these stations C_s is non-empty. For the
remaining stations, C_s = ∅. Once the NOC has received the FAULT
DETECTED message from all stations that detected a failure, it
computes the identity of the faulty link by evaluating the
following expression:

    C = ( ∩_{s: C_s ≠ ∅} C_s ) − ( ∪_{s: C_s = ∅} E_s )        (1)

In Equation (1), the second term prunes from the candidate set
links that are monitored by stations that did not detect failures.

[0081] Example 4: Consider the graph in Example 1 where station
s_2 monitors links (s_2,a) and (c,d), and s_1 monitors the remaining
links. Suppose that link (a,b) fails. Both s_1 and s_2 detect a
failure, and send to the NOC FAULT DETECTED messages containing
links (a,b) and (c,d), respectively. The NOC calculates the sets
C_{s_1} = {(a,b)} and C_{s_2} = {(a,b), (b,c), (c,d)}, and computes
the set C = C_{s_1} ∩ C_{s_2} = {(a,b)}, which contains only the
failed link (a,b). It will be stated without proof that set C in
Equation (1) contains only the faulty link f.
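
The NOC-side computation can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the helper names `edge`, `candidate_set` and `locate_fault` are hypothetical, links are modeled as unordered pairs, and the set given for E_{s_1} lists only the links of Example 4 that matter here (the complete set depends on FIGURE 4A, which is not reproduced).

```python
def edge(x, y):
    """Undirected link, so that (a, b) and (b, a) compare equal."""
    return frozenset((x, y))


def candidate_set(path_to_f, monitored, f):
    """C_s = (P_{s,f_s} - E_s) | {f_s} for a station that detected a failure."""
    return (set(path_to_f) - set(monitored)) | {f}


def locate_fault(reports, monitored_by):
    """Evaluate Equation (1) at the NOC.

    reports:      station -> C_s, only for stations that sent FAULT DETECTED
    monitored_by: station -> E_s, for every station
    """
    if not reports:
        return set()
    candidates = set.intersection(*(set(c) for c in reports.values()))
    for station, links in monitored_by.items():
        if station not in reports:      # C_s is empty: its links are healthy
            candidates -= set(links)
    return candidates


ab, bc, cd = edge("a", "b"), edge("b", "c"), edge("c", "d")
s2a, s1b = edge("s2", "a"), edge("s1", "b")

E = {
    "s1": {s1b, ab, bc},   # assumed: only the links of E_s1 relevant here
    "s2": {s2a, cd},
}

# Link (a, b) fails; both stations report (Example 4).
reports = {
    "s1": candidate_set([s1b, ab], E["s1"], ab),
    "s2": candidate_set([s2a, ab, bc, cd], E["s2"], cd),
}
print(locate_fault(reports, E) == {ab})  # True
```

If only s_2 had reported, the second term of Equation (1) would remove the links monitored by the silent station s_1, leaving (c,d) as the only candidate.
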
[0082] Robust Link Monitoring 

[0083] A system for monitoring links is "robust" if it continues 
to monitor network links even if one or more links in the network 
fail. A key challenge in designing such a robust monitoring system 
is selecting a set of stations whose RTs always cover all the 
active network links. The problem is that when a link f fails, the
new RT T_{s,f} for a station s may be different from T_s, the RT
prior to the failure. As a result, a station s that was responsible
for monitoring a link e ∈ T_s may be unable to monitor the link
once link f fails. The problem is further complicated by the fact
that the RTs T_{s,f} for the various stations may not be known in
advance (when stations are selected).

[0084] As an example, consider the graph in Example 1 with RTs
T_{s_1} and T_{s_2} as shown in FIGURES 4B and 4C, respectively.
The failure of link f = (a,b) causes the RTs of s_1 and s_2 to be
modified as shown in FIGURE 4D. Clearly, link (c,d) ∈ T_{s_2} can
no longer be monitored after f fails, since (c,d) ∉ T_{s_2,f}.
Thus, the monitoring system (with stations s_1 and s_2) in FIGURE
4A is not robust.
[0085] Now, the problem of efficient placement of monitoring
stations that guarantees delay and fault monitoring of all active
links in the presence of at most K−1 failures will be considered.
This problem is referred to as the "K-Fault Resilient Monitoring"
(K-FRM) problem, for which a solution with an approximation ratio
of ln(|E|) will be developed. Once the set S of stations is
computed, the probe message assignment is computed as described
above. For simplicity, only link failures will be considered;
however, the general approach can be easily enhanced to support
node failures as well.

[0086] A set S of stations is resilient to one fault if and only
if the set S satisfies the following "fault resilience" property:
For every link f ∈ E, for every other link e ≠ f, e ∈ T_{s,f} for
some station s ∈ S. The fault resilience property ensures that,
when an arbitrary link fails, every other active link is contained
in the new RT of some station in S. However, finding a set of
stations that satisfies the property may be difficult since the
trees T_{s,f} may not be known in advance. Further, the property
becomes extremely complex when K-fault resilience is considered,
since any combination of K−1 links can potentially fail.

[0087] Due to the above-mentioned reasons, S is instead required
to satisfy a stronger but simpler condition that implies the above
fault resilience property. The condition does not rely on the
knowledge of T_{s,f}, but exploits the fact that T_s and T_{s,f}
are identical with respect to paths that do not contain the failed
link f. Let F_e(T_s) be the parent link of link e in T_s. Then the
stronger condition is based on the key observation that S is
resilient to a single link failure if one of the following two
conditions holds for every link e ∈ E:

(1) one of e's endpoints is in S, or

(2) link e is in the RTs of at least two monitoring stations
s_1, s_2 ∈ S, and F_e(T_{s_1}) ≠ F_e(T_{s_2}).

[0088] The following proposition presents a more general
sufficient condition for any K-fault resilient monitoring system:
a set S of monitoring stations is K-fault resilient if for every
link e = (u,v) ∈ E, at least one of the following conditions is
satisfied:

(1) one of nodes u or v is in S, or

(2) K nodes are in S, denoted by S_e = {s_1, s_2, ..., s_K}, whose
RTs T_{s_i} contain link e, and for every pair of distinct
nodes s_i, s_j ∈ S_e, it is the case that F_e(T_{s_i}) ≠ F_e(T_{s_j}).
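
The sufficient condition above can be checked mechanically. The following Python sketch is illustrative only: the function name `is_k_fault_resilient`, the dictionary layout of `parent_link`, and the four-node chain p, a, b, q used as test data are all assumptions made for this example and do not come from the figures.

```python
def is_k_fault_resilient(stations, links, parent_link, k):
    """Check the sufficient condition for K-fault resilience.

    stations:    set of chosen monitoring nodes S
    links:       iterable of links (u, v)
    parent_link: parent_link[s][e] is F_e(T_s), the parent link of e in the
                 RT of node s; e is absent when e is not in T_s (links
                 incident on s may be omitted, condition (1) covers them)
    """
    for e in links:
        u, v = e
        if u in stations or v in stations:           # condition (1)
            continue
        # Condition (2): K stations with pairwise distinct parent links of e
        # exist exactly when at least K distinct parent links are observed.
        parents = {parent_link[s][e] for s in stations if e in parent_link[s]}
        if len(parents) < k:
            return False
    return True


# Hypothetical chain topology p - a - b - q.
links = [("p", "a"), ("a", "b"), ("b", "q")]
parent_link = {
    "p": {("a", "b"): ("p", "a"), ("b", "q"): ("a", "b")},
    "q": {("a", "b"): ("b", "q"), ("p", "a"): ("a", "b")},
}

print(is_k_fault_resilient({"p", "q"}, links, parent_link, 2))  # True
print(is_k_fault_resilient({"p"}, links, parent_link, 2))       # False
```

With both end nodes as stations, link (a,b) is reached through two different parent links, so it stays monitored when either neighbor link fails.
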

[0089] Thus, the K-FRM Problem is defined as follows. Given are
a constant K, a graph G(V,E) and an RT T_v for every node v ∈ V.
Find the smallest subset S ⊆ V such that, for every link
e = (u,v) ∈ E, at least one of the following two conditions is
satisfied.

(1) one of nodes u or v is in S, or

(2) K nodes are in S, denoted by S_e = {s_1, s_2, ..., s_K}, whose
RTs T_{s_i} contain link e, and for every pair of distinct
nodes s_i, s_j ∈ S_e, it is the case that F_e(T_{s_i}) ≠ F_e(T_{s_j}).

[0090] The K-FRM problem is a generalization of the LM problem
defined above, and any instance of the LM problem can be
represented as an instance of the K-FRM problem with K = 1.
Because the LM problem is NP-hard, and because the lower bound of
any approximation algorithm for the LM problem is ½ ln(|V|), it will
be stated without proof that the K-FRM problem is also NP-hard.
Further, the lower bound of any approximation algorithm for the
problem is O(ln(|V|)).

[0091] To solve the K-FRM problem, it is mapped to an extended 
version of the set cover (SC) problem referred to as the Partial 
Multi-Set Cover (PMSC) problem. 

[0092] The PMSC Problem is defined as follows. Given are a
constant K, a universe of elements Z, and the following two
collections of subsets of Z: 𝒴 = {Y_1, Y_2, ..., Y_m} and 𝒬 =
{Q_1, Q_2, ..., Q_n}. Each Y_j ∈ 𝒴 contains at least K elements,
and is disjoint from other members of 𝒴. Find the smallest
collection 𝒮 ⊆ 𝒬 such that for every Y_j ∈ 𝒴, |U(𝒮) ∩ Y_j| ≥ K,
where U(𝒮) = ∪_{Q ∈ 𝒮} Q is the union of the collection 𝒮.

[0093] The PMSC problem is more general than the SC problem.
Every instance I(Z,𝒬) of the SC problem can be reduced to an
instance of the PMSC problem by selecting K = 1 and defining the
collection 𝒴 = {{z} | ∀z ∈ Z} in which every subset Y_i contains a
single element of Z. Thus, the optimal solution of the calculated
PMSC instance is also the optimal solution of the given SC
instance, I(Z,𝒬). Therefore, PMSC is NP-hard, and it has a lower
bound of at least ln(|Z|).

[0094] A greedy PMSC algorithm will now be described for solving
the PMSC problem (see FIGURE 5 for pseudocode). The greedy PMSC
algorithm uses ideas similar to those employed by the greedy SC
algorithm described above. More specifically, in each iteration,
the greedy algorithm selects the most cost-effective set Q* ∈ 𝒬
until all the sets in 𝒴 are covered, as explained below. In the
greedy PMSC algorithm, 𝒮 ⊆ 𝒬 is the collection of subsets that have
been selected so far. We say that set Y_j is "covered" by 𝒮 if U(𝒮)
contains at least K elements from Y_j, i.e., |U(𝒮) ∩ Y_j| ≥ K. A
variable u_j is associated with each set Y_j ∈ 𝒴 that specifies the
number of "uncovered" elements in Y_j − U(𝒮) that still need to be
selected to cover Y_j. Thus, u_j = 0 if Y_j is already covered;
otherwise u_j = K − |U(𝒮) ∩ Y_j|. We use u_total = Σ_{j=1}^{m} u_j
to represent the total number of uncovered elements that should be
selected for covering all the sets Y_j ∈ 𝒴. Note that when
u_total = 0, the calculated selection 𝒮 is a feasible solution. A
value n_i is associated with every set Q_i ∈ 𝒬, Q_i ∉ 𝒮, that is
the total, over all uncovered sets Y_j, of the number of uncovered
elements in Y_j that are also contained in Q_i, and can be selected
to cover Y_j. Thus, n_i = Σ_{j=1}^{m} min{u_j, |Q_i ∩ (Y_j − U(𝒮))|}.
Note that adding Q_i to 𝒮 causes u_total to decrease by n_i.
Therefore, in each iteration, the greedy algorithm (see FIGURE 5)
adds to 𝒮 the "most cost-effective" set Q_i ∈ 𝒬 − 𝒮, that is, the
set that maximizes n_i, the reduction in u_total per selected set
(every set has unit cost, since the cost of a solution is its
size).
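
The greedy selection rule can be sketched in Python as follows. FIGURE 5 is not reproduced in this text, so this is a hedged reconstruction from the prose above rather than the pseudocode of the figure; the function name `greedy_pmsc`, its argument layout and the small test instance are assumptions made for illustration.

```python
def greedy_pmsc(k, Y, Q):
    """Greedy heuristic for Partial Multi-Set Cover.

    k: number of elements of each Y_j that must be covered
    Y: list of pairwise disjoint sets, each with at least k elements
    Q: list of candidate sets
    Returns the indices of the selected members of Q, in selection order.
    """
    covered = set()     # U(S), the union of the sets selected so far
    chosen = []

    def uncovered(y):
        # u_j: elements of Y_j still needed to reach k covered elements
        return max(0, k - len(covered & y))

    def gain(q):
        # n_i: how much u_total drops if q is added to the selection
        return sum(min(uncovered(y), len(q & (y - covered))) for y in Y)

    while sum(uncovered(y) for y in Y) > 0:
        remaining = [i for i in range(len(Q)) if i not in chosen]
        best = max(remaining, key=lambda i: gain(Q[i]), default=None)
        if best is None or gain(Q[best]) == 0:
            raise ValueError("no feasible cover exists")
        chosen.append(best)
        covered |= Q[best]
    return chosen


# Illustrative instance with K = 2.
Y = [{1, 2, 3}, {4, 5}]
Q = [{1, 4}, {2, 3, 5}, {3}]
print(greedy_pmsc(2, Y, Q))  # [1, 0]
```

In the first iteration the gains are 2, 3 and 1, so Q[1] is chosen; afterwards only one element of {4, 5} is still needed, and Q[0] supplies it.
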

[0095] The approximation ratio of the greedy PMSC algorithm is
calculated using a technique similar to the one used for proving
the approximation ratio of the greedy SC algorithm in Chvátal,
supra. Consider the solution 𝒮 returned by the greedy algorithm.
Its cost is COST(𝒮) = |𝒮|. Let Q*_r be the set added to 𝒮 in the
r-th iteration, and n*_r be the amount u_total is reduced due to
the addition of Q*_r to 𝒮. Since initially u_total = mK, it follows
that Σ_{r=1}^{|𝒮|} n*_r = mK. Let OPT be the optimal solution and
let COST(OPT) = |OPT| denote its cost.

[0096] In the r-th iteration of the greedy algorithm,

    n*_r ≥ ( Σ_{i=r}^{|𝒮|} n*_i ) / COST(OPT).

From this, it follows that

    COST(𝒮) = |𝒮| ≤ COST(OPT) · Σ_{r=1}^{|𝒮|} ( n*_r / Σ_{i=r}^{|𝒮|} n*_i ).

Since Σ_{r=1}^{|𝒮|} n*_r = mK, a series of algebraic
manipulations that will not be set forth here leads to the
conclusion that the approximation ratio of the greedy PMSC
algorithm is ln(K) + ln(m) + 1, where m = |𝒴|.
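
One standard way to carry out the omitted manipulation is sketched below in LaTeX. It is offered as a plausible completion under the usual harmonic-sum argument for greedy covering, not necessarily the derivation the authors had in mind; the shorthand R_r for the number of elements still uncovered at the start of iteration r is introduced here for convenience.

```latex
\begin{align*}
R_r &= \sum_{i=r}^{|\mathcal{S}|} n^*_i, \qquad R_1 = mK, \qquad
       R_{|\mathcal{S}|+1} = 0,\\
n^*_r &\ge \frac{R_r}{\mathrm{COST}(OPT)}
  \;\Longrightarrow\; 1 \le \mathrm{COST}(OPT)\,\frac{n^*_r}{R_r},\\
|\mathcal{S}| = \sum_{r=1}^{|\mathcal{S}|} 1
  &\le \mathrm{COST}(OPT)\sum_{r=1}^{|\mathcal{S}|}\frac{n^*_r}{R_r}
  \le \mathrm{COST}(OPT)\sum_{r=1}^{|\mathcal{S}|}\;
      \sum_{t=R_{r+1}+1}^{R_r}\frac{1}{t}\\
  &= \mathrm{COST}(OPT)\sum_{t=1}^{mK}\frac{1}{t}
  \le \mathrm{COST}(OPT)\,\bigl(\ln(mK)+1\bigr)
  = \mathrm{COST}(OPT)\,\bigl(\ln K+\ln m+1\bigr).
\end{align*}
```

The middle step uses the fact that the inner sum has exactly n*_r terms (because R_r − R_{r+1} = n*_r), each at least 1/R_r.
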

[0097] The K-FRM problem is solved by first reducing it to the
PMSC problem, and then applying the greedy PMSC algorithm shown in
FIGURE 5. In order to map a K-FRM instance involving the graph
G(V,E) to a PMSC instance, the collections 𝒴 and 𝒬 need to be
defined.

[0098] The collection 𝒴 contains m = |E| disjoint sets, where
each set Y_e ∈ 𝒴 results from a link e ∈ E and contains at least K
elements. The collection 𝒬 contains n = |V| sets, where each set
Q_v ∈ 𝒬 is derived from the RT T_v of node v.

[0099] Let S ⊆ V be any subset of nodes and let 𝒮 ⊆ 𝒬 be the
corresponding collection of sets such that 𝒮 = {Q_v | v ∈ S}. The
reduction described herein guarantees that S is a feasible solution
for the K-FRM problem if and only if 𝒮 is a feasible solution for
the corresponding PMSC instance. Here, a feasible solution S for
the K-FRM problem is one for which every link e ∈ E satisfies one
of the two conditions in the definition of the K-FRM problem set
forth above, while 𝒮 is a feasible solution for a PMSC instance if
for every Y_j ∈ 𝒴, |U(𝒮) ∩ Y_j| ≥ K.

[0100] To achieve the above-mentioned goal, Z includes two types
of elements, where each type is used to ensure that one of the
conditions in the definition of the K-FRM problem set forth above
is captured. Let A_v and A_e denote the set of links in E that are
incident on node v and endpoints of edge e, respectively. For
capturing the first condition, K different elements that are
included in both sets Y_e and Q_v are defined for every node v ∈ V
and every one of its outgoing links e ∈ A_v. Each element is
represented by a triple <e,v,k>, for every 1 ≤ k ≤ K. Thus,
selecting a set Q_v ensures that all the sets Y_e, e ∈ A_v, are
covered.

[0101] The second condition is reflected in a more
straightforward manner. For every link e = (u,v) ∈ E and each one
of its adjacent links e' ∈ A_e, an element <e,e'> is defined and
included in the set Y_e. The element <e,e'> is also included in the
set Q_v of every node v such that link e' is the parent of link e in
the tree T_v, i.e., e' = F_e(T_v). Thus, selecting one of the sets
Q_v ensures that a single uncovered element in Y_e is covered.

[0102] In summary, for every e = (u,v) ∈ E, the set Y_e ∈ 𝒴 is
defined as

    Y_e = {<e,v,k>, <e,u,k> | 1 ≤ k ≤ K} ∪ {<e,e'> | e' ∈ A_e}

and for every node v ∈ V, the set Q_v is defined as

    Q_v = {<e,v,k> | e ∈ A_v, 1 ≤ k ≤ K} ∪ {<e,F_e(T_v)> | e ∈ T_v ∧ e ∉ A_v}

[0103] Suppose the greedy algorithm returns collection 𝒮 as the
solution to the above PMSC instance. Then, the solution to the
original K-FRM problem is computed as S = {v | Q_v ∈ 𝒮}. Clearly,
since 𝒮 covers every set Y_e ∈ 𝒴, every link e ∈ E either has an
endpoint in S, or it appears in at least K RTs of nodes in S, with
distinct parent links. Thus, S is a feasible solution to the K-FRM
problem. Further, since the mapping between 𝒮 and S does not alter
the cost of the solutions, and since, as stated above, the
approximation ratio of the greedy PMSC algorithm is
ln(K) + ln(m) + 1, it follows that the cost of solution S is within
a ln(K) + ln(|E|) + 1 factor of the optimal solution to the K-FRM
problem.
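
The construction of Y_e and Q_v can be sketched in Python as follows. This is an illustrative sketch under assumptions: the function names `build_pmsc_instance` and `covers`, the tuple encoding of the elements <e,v,k> and <e,e'>, and the four-node chain p, a, b, q used as test data are hypothetical and chosen only to make the example self-contained.

```python
def build_pmsc_instance(links, parent_link, k):
    """Build the collections Y and Q of the PMSC instance for a K-FRM instance.

    links:       list of links (u, v)
    parent_link: parent_link[v][e] is F_e(T_v) for every link e in the RT of
                 node v that is not incident on v
    Returns (Y, Q): Y maps each link e to Y_e, Q maps each node v to Q_v.
    """
    nodes = {x for e in links for x in e}
    incident = {v: {e for e in links if v in e} for v in nodes}     # A_v

    Y, Q = {}, {v: set() for v in nodes}
    for e in links:
        u, v = e
        adjacent = (incident[u] | incident[v]) - {e}                # A_e
        Y[e] = {(e, x, i) for x in (u, v) for i in range(1, k + 1)}
        Y[e] |= {(e, e2) for e2 in adjacent}
    for v in nodes:
        Q[v] |= {(e, v, i) for e in incident[v] for i in range(1, k + 1)}
        Q[v] |= {(e, p) for e, p in parent_link.get(v, {}).items()
                 if e not in incident[v]}
    return Y, Q


def covers(Y, Q, stations, k):
    """True if the sets Q_v of the given stations cover every Y_e K times."""
    union = set().union(*(Q[v] for v in stations))
    return all(len(union & y) >= k for y in Y.values())


# Hypothetical chain topology p - a - b - q with K = 2.
pa, ab, bq = ("p", "a"), ("a", "b"), ("b", "q")
parent_link = {
    "p": {ab: pa, bq: ab},
    "a": {bq: ab},
    "b": {pa: ab},
    "q": {ab: bq, pa: ab},
}
Y, Q = build_pmsc_instance([pa, ab, bq], parent_link, 2)

print(covers(Y, Q, {"p", "q"}, 2))  # True
print(covers(Y, Q, {"p"}, 2))       # False
```

Selecting Q_p and Q_q covers Y_(a,b) through the two elements <(a,b),(p,a)> and <(a,b),(b,q)>, which mirrors condition (2) with two distinct parent links.
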

[0104] Note that the greedy algorithm takes O(|V|^3) steps to
solve the PMSC instance corresponding to the K-FRM problem, since
|Q_v| = O(|V|) and |𝒬| = |V|.

[0105] Once a set S of K-fault resilient monitoring stations has
been selected, initially each link e = (u,v) is monitored by the
station s ∈ S such that e ∈ T_s and c_{s,u} + c_{s,v} is minimum.
The NOC keeps track of failed network links in the variable X. When
the NOC detects the failure of a network link f, it adds the link
to X.

[0106] Further, for each link e currently being monitored by a
station s such that P_{s,e} contains the failed link f, the NOC
computes a new station s' for monitoring e. The new station s' for
e, in addition to satisfying the conditions that e ∈ T_{s'} and
c_{s',u} + c_{s',v} is minimum, also needs to satisfy the condition
that P_{s',e} ∩ X = ∅. This ensures that the routing path from s'
to e does not contain any of the failed links. Note that since S is
K-fault resilient, at least K disjoint routing paths exist from a
station in S to each link e not adjacent to a station in S. Thus,
each active network link can be continually monitored by some
station in S as long as the number of failures |X| < K. To monitor
both the fault and delay of a link e = (u,v), the station s
monitoring e sends the probe messages m(s,v,h_{s,u}) and
m(s,v,h_{s,v}), as described above.
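
The reassignment step performed by the NOC can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `pick_station`, the dictionary layout of `path` and `cost`, and the chain topology in the usage example are hypothetical. Ties are broken in favor of the smaller station identifier, as in the probe message assignment described earlier.

```python
def pick_station(link, stations, path, cost, failed):
    """Choose the station that should monitor `link` given the failed links X.

    path[s][link]: links on the routing path P_{s,link} (including `link`),
                   present only when `link` is in the RT of station s
    cost[s][link]: c_{s,u} + c_{s,v}, the probing cost from station s
    failed:        set X of links known to have failed
    Returns None when no station has a failure-free path to the link.
    """
    usable = [s for s in sorted(stations)
              if link in path[s] and not (set(path[s][link]) & failed)]
    if not usable:
        return None
    # min() keeps the first minimum, so sorted order breaks ties by identifier.
    return min(usable, key=lambda s: cost[s][link])


# Hypothetical chain p - a - b - q with stations p and q.
pa, ab, bq = ("p", "a"), ("a", "b"), ("b", "q")
path = {"p": {ab: [pa, ab]}, "q": {ab: [bq, ab]}}
cost = {"p": {ab: 3}, "q": {ab: 3}}

print(pick_station(ab, {"p", "q"}, path, cost, set()))     # p
print(pick_station(ab, {"p", "q"}, path, cost, {pa}))      # q
print(pick_station(ab, {"p", "q"}, path, cost, {pa, bq}))  # None
```
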

[0107] From the above, it is apparent that the two-phased
monitoring methodology described herein ensures complete coverage
of the network in the presence of link failures and ideally
minimizes the monitoring overhead on the underlying production
network. In the first phase of the approach described herein, the
locations of a minimal set of monitoring stations are computed such
that all network links are covered, even if some links in the
network were to fail. Subsequently, in the second phase, the
minimal set of probe messages to be sent by each station is
computed such that the latency of every network link can be
measured, and faulty network links can be isolated.



[0108] Unfortunately, both the station selection and the probe
message assignment problems are NP-hard. However, the
polynomial-time greedy approximation algorithms described herein
achieve close to the best possible approximations for both
problems.

[0109] Although the present invention has been described in 
detail, those skilled in the art should understand that they can 
make various changes, substitutions and alterations herein without 
departing from the spirit and scope of the invention in its 
broadest form.